Reproducible reporting

An introduction to Quarto

Division of Pharmacoepidemiology and Pharmacoeconomics
Brigham and Women’s Hospital
Harvard Medical School

August 25, 2024

Problem statement

Wait, but how was that done exactly?

Problem statement (i)

  • Statistical and computational methods are frequently reported in ambiguous terms, e.g.,

    “We measured the pre-exposure performance status within 90 days of the index date.”

  • Does the 90-day window include or exclude the index date? What was done if there were multiple performance assessments per patient? …

  • Take a moment to reflect: would you be able to exactly reproduce a study you published 10 years ago based solely on the paper’s methods section?
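To make the ambiguity concrete, the base R sketch below applies two defensible readings of “within 90 days of the index date” to a small, invented table (the column names pt_id, index_date, assess_date and ecog are hypothetical). The rule for multiple assessments per patient, keeping the one closest to the index date, is likewise an assumption chosen only for illustration.

```r
# Hypothetical data: one index date per patient, possibly several assessments
assessments <- data.frame(
  pt_id       = c(1, 1, 2, 3),
  index_date  = as.Date(c("2020-06-01", "2020-06-01", "2020-06-01", "2020-06-01")),
  assess_date = as.Date(c("2020-03-03", "2020-05-15", "2020-06-01", "2020-02-01")),
  ecog        = c(0, 1, 2, 1)
)

# Days from assessment to index date (0 = assessed on the index date itself)
assessments$days_before <- as.numeric(assessments$index_date - assessments$assess_date)

# Reading A: window [index - 90, index], index date included
in_window_a <- assessments$days_before >= 0 & assessments$days_before <= 90
# Reading B: window [index - 90, index - 1], index date excluded
in_window_b <- assessments$days_before >= 1 & assessments$days_before <= 90

# Assumed rule for multiple assessments: keep the one closest to the index date
closest_per_patient <- function(df) {
  df <- df[order(df$pt_id, df$days_before), ]
  df[!duplicated(df$pt_id), c("pt_id", "ecog")]
}

res_a <- closest_per_patient(assessments[in_window_a, ])
res_b <- closest_per_patient(assessments[in_window_b, ])
print(res_a)
print(res_b)
```

With these toy rows, patient 2, assessed on the index date itself, has a baseline value under reading A but none under reading B, while patient 3, assessed 121 days before the index date, is excluded under either reading.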

Problem statement (ii)

One could find the details in the analytical programming code, BUT…

Is there a reproducibility crisis?

Nature survey: More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments (Baker 2016)

What if…

If only there were a way to combine…

  • the narrative prose that explains the methods used

  • the analytic code we implemented to execute these methods

  • the corresponding results

…all in one report?

Literate programming

Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do (Donald Knuth, Turing Award recipient)

Definition

It is basically an annotated, executable manuscript!

Literate programming

A programming paradigm introduced in 1984 by Donald Knuth in which a computer program is given as an explanation of how it works in a natural language, such as English, interspersed (embedded) with snippets of macros and traditional source code, from which compilable source code can be generated.

(Knuth 1984)

In other words…

\[ \text{Literate programming} = \text{Documentation + Source Code + Output/Results} \]
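The equation above can be illustrated with a toy “weave” step in base R. The helper weave_report() and its chunk format are hypothetical, written only for this illustration; real tools such as Quarto do far more. Text chunks pass through as documentation, code chunks are echoed as source code, and whatever they print is captured as output.

```r
# Hypothetical helper: turns a list of prose and code chunks into one report
weave_report <- function(chunks) {
  env <- new.env()          # shared environment: later chunks see earlier objects
  report <- character(0)
  for (chunk in chunks) {
    if (chunk$type == "text") {
      report <- c(report, chunk$content)                 # documentation
    } else {
      report <- c(report, paste(">", chunk$content))     # source code
      printed <- capture.output(
        eval(parse(text = chunk$content), envir = env)   # run the code
      )
      if (length(printed) > 0) {
        report <- c(report, paste("#>", printed))        # output/results
      }
    }
  }
  report
}

chunks <- list(
  list(type = "text", content = "We summarize the age (in years) of three patients."),
  list(type = "code", content = "age <- c(60, 70, 80)"),
  list(type = "code", content = "mean(age)")
)
report <- weave_report(chunks)
writeLines(report)
```

Running it prints the prose line, both code lines, and the captured result of mean(age). Only the value of the last expression in a code chunk is shown, a deliberate simplification that keeps the sketch short.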

History of literate programming

  • Literate programming is a concept pioneered by Donald Knuth, a Turing Award recipient known for creating TeX.

  • The main idea behind the early form of literate programming was to upend the traditional programming practices of the time by systematically adding human-readable text that accompanies a program and explains its logic and purpose.

  • As he describes in “Literate Programming”, Knuth regards the programmer as an “essayist” who should strive to communicate the purpose of a program, which in turn leads to better code.

  • While initially centered on computer science, literate programming has more recently seen a resurgence in the interdisciplinary field of “data science”.

https://bernhardbieri.ch/blog/2022-08-25-litteralprogramminginstata/

Document complexity of technical report

Strengths and weaknesses of technical reporting systems

Example

Methods section text:

“A propensity score model for exposure initiation was fit using logistic regression with age, sex and smoking as covariates. Patients were matched using nearest neighbor matching on the propensity score in a 1:1 ratio without replacement targeting the average treatment effect among the treated (ATT).”

Corresponding R code:

MatchIt::matchit(
  formula = exposure ~ age_num + female_cat + smoking_cat,
  data = smdi::smdi_data,
  ratio = 1,
  method = "nearest",
  distance = "glm",
  link = "logit",
  estimand = "ATT",
  replace = FALSE
)

Output:

A matchit object
 - method: 1:1 nearest neighbor matching without replacement
 - distance: Propensity score
             - estimated with logistic regression
 - number of obs.: 2500 (original), 1996 (matched)
 - target estimand: ATT
 - covariates: age_num, female_cat, smoking_cat
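For readers who want to see what the matching step does mechanically, here is a simplified base R sketch on simulated data; the data-generating model and seed are invented, and only the variable names mirror the slide. It fits the logistic propensity score model with glm() and then performs greedy 1:1 nearest neighbor matching without replacement, processing treated patients from the highest propensity score down (an assumed ordering). This is not MatchIt’s implementation, and its matches will not reproduce the output above.

```r
set.seed(42)
# Simulated stand-in for the study data (values are invented)
n <- 500
dat <- data.frame(
  age_num     = rnorm(n, mean = 65, sd = 10),
  female_cat  = factor(rbinom(n, 1, 0.5)),
  smoking_cat = factor(rbinom(n, 1, 0.3))
)
lin_pred <- -4 + 0.05 * dat$age_num + 0.3 * (dat$smoking_cat == "1")
dat$exposure <- rbinom(n, 1, plogis(lin_pred))

# Step 1: propensity score from a logistic regression
ps_model <- glm(exposure ~ age_num + female_cat + smoking_cat,
                data = dat, family = binomial(link = "logit"))
dat$ps <- fitted(ps_model)

# Step 2: greedy 1:1 nearest neighbor matching without replacement.
# ATT: every treated patient looks for one control.
treated  <- which(dat$exposure == 1)
controls <- which(dat$exposure == 0)
treated  <- treated[order(dat$ps[treated], decreasing = TRUE)]  # highest PS first
pairs <- data.frame(treated = integer(0), control = integer(0))
for (i in treated) {
  if (length(controls) == 0) break                      # no controls left
  j <- controls[which.min(abs(dat$ps[controls] - dat$ps[i]))]
  pairs <- rbind(pairs, data.frame(treated = i, control = j))
  controls <- setdiff(controls, j)                      # without replacement
}

matched <- dat[c(pairs$treated, pairs$control), ]
table(matched$exposure)
```

Each treated patient is paired with the closest remaining control, so the matched sample contains equal numbers of exposed and unexposed patients, consistent with a 1:1 ATT design.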

Goal: single-source publishing

Introduction to Quarto

Examples

Reproducible projects and manuscripts

References

Baker, Monya. 2016. “1,500 Scientists Lift the Lid on Reproducibility.” Nature 533 (7604): 452–54. https://doi.org/10.1038/533452a.
Knuth, Donald Ervin. 1984. “Literate Programming.” The Computer Journal 27 (2): 97–111.